Shaken-Not-Stirred Technology Mixed Right!

Brett Wynkoop
wynkoop--AT--tekhq.com

Something Simple    
The best Unix Systems Administrators find ways to make systems administration simple and less prone to human error. One simple step I take to make my life easier and leave less chance for errors is to change the GECOS field for root on systems I administer.

Your typical BSD root password file entry looks like:

root:*:0:0:Charlie &:/root:/bin/csh

The "&" in the GECOS field expands to the login name with its first letter capitalized, so root's real name shows up as Charlie Root. What is wrong with that, you ask?
We all know that Unix systems use email to communicate status and error information to the Systems Administrator. When you scan the message list of your mailbox in many modern Mail User Agents (MUAs), a message from "Charlie Root" gives you no clue which system generated it. That means you need to take the extra step of opening the mail and looking carefully to determine which system it came from.

If we simply change the GECOS field to something more descriptive, we can see which system has the problem before we even open the mail, and together with the subject line that may already give us a good idea of what is going on. So I set the GECOS field for root as in the example below:

root:*:0:0:root@box1.example.com:/root:/bin/csh

This causes the name displayed by my MUA to show as root@box1.example.com instead of Charlie Root. This makes my life easier, and I suspect it will do the same for you. Give it a try.
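To check what name a given passwd(5) entry will produce, here is a minimal POSIX sh sketch of the expansion. It assumes the usual BSD convention that the full name is the first comma-separated GECOS subfield and that every "&" stands for the login name with its first letter capitalized; `gecos_name` is a hypothetical helper, and an individual mailer or MUA may build its display name differently.

```sh
#!/bin/sh
# gecos_name LINE
# Hypothetical helper: print the "real name" a mailer would typically
# derive from a passwd(5) line. Assumes the name is the first
# comma-separated subfield of GECOS (field 5) and that "&" stands for
# the login name with its first letter capitalized.
gecos_name() {
    printf '%s\n' "$1" | awk -F: '{
        login = $1
        split($5, parts, ",")          # GECOS subfields: name,office,...
        name = parts[1]
        cap = toupper(substr(login, 1, 1)) substr(login, 2)
        gsub(/&/, cap, name)           # expand every "&"
        print name
    }'
}

gecos_name 'root:*:0:0:Charlie &:/root:/bin/csh'               # prints: Charlie Root
gecos_name 'root:*:0:0:root@box1.example.com:/root:/bin/csh'   # prints: root@box1.example.com
```

On FreeBSD the change itself is best made with `chpass root` or `pw usermod root -c "root@$(hostname)"` (assuming `hostname` returns the fully qualified name) rather than by editing the password files by hand, since both tools rebuild the password databases for you.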

-Brett Wynkoop
wynkoop--at--wynn.com


No SPF record is better than a broken SPF record    
To SPF or not to SPF, that is the question.

Ring....Ring....Ring..

Me: Good day.

Headhunter: I sent you an email and it bounced back. Do you have a working email address you can give me?

Me: Yes, wynkoop AT wynn.com

Headhunter: That is what I used and it bounced back.

Me: Why did it bounce back?

Headhunter: I do not know, it does not say. It just says it bounced back.

Me: I am sure it says something. Can you read it to me?

Headhunter: No really it does not say anything.

Me: Can you please read the bounce back message and use your scroll bar to scroll down as you read it.

Headhunter: It says "Rejected - IP 10.0.0.54 not authorized to send mail for example.com"

Me: OK, that tells me the problem is on your end. You need to check your email server and DNS server and correct the error on your DNS server.

Headhunter: I never had this problem before. I send lots of email, so it must be you. Please can you give me a good email address?

The above is a conversation that I seem to have at least once a week. The problem is not always SPF; sometimes it is email from an IP that is on a block list, other times the email trips a Bayesian filter or is rejected for a combination of reasons. I run a very tight spam filter at wynn.com that pays strict attention to the RFCs that govern SMTP and DNS.

Usually I offer my services to the company in question to fix the problem for them. Most of the time the recruiting firm has no real technical talent, and while they may be happy to place an expert out under their name, they are scared to have that same expert fix the recruiting firm's own computing problems.

Today I decided to go the extra mile and send a detailed report to the recruiter on the off chance that I might get the gig to fix their email infrastructure.

So why was the email to me rejected? The story lies in DNS text records. Please note that the domain and IP information have been changed to protect the guilty.

[wynkoop@wa3yre ~]$ nslookup
set type=txt
example.com
Server: 199.89.147.3
Address: 199.89.147.3#53

Non-authoritative answer:
example.com text = "MS=ms48171289"
example.com text = "v=spf1 include:spf.protection.example.net -all"

Authoritative answers can be found from:
example.com nameserver = ns25.domaincontrol.com.
example.com nameserver = ns26.domaincontrol.com.


The above tells us we have to take another look. This time we need to look up spf.protection.example.net.

spf.protection.example.net text = "v=spf1 ip4:192.168.20.0/24 include:spfa.protection.example.net -all"

So now our SMTP server knows that it can accept example.com mail only from 192.168.20.0/24 or from whatever spf.protection.example.net expands to... and yep, you guessed it, they had an endless loop in their SPF record. But wait, as they say on TV, "That's not all! You also get this lovely SENDER/SPF mismatch at no additional cost!"
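The include chasing an SPF checker has to do can be sketched in POSIX sh. This is a minimal sketch, not a full RFC 7208 evaluator: `fake_txt` is a hypothetical stand-in for a real TXT query, its first two records mirror the ones shown above, and the record for spfa.protection.example.net is an assumption invented only to reproduce a loop. The walker remembers which domains are on the current include path and stops when one repeats; a compliant checker is likewise bound to give up, since RFC 7208 caps a check at 10 DNS-querying terms and treats exceeding that as a permanent error.

```sh
#!/bin/sh
# fake_txt NAME: hypothetical stand-in for a DNS TXT lookup.
# The spfa.* record is an ASSUMPTION, made up to show a loop.
fake_txt() {
    case "$1" in
        example.com)
            echo 'v=spf1 include:spf.protection.example.net -all' ;;
        spf.protection.example.net)
            echo 'v=spf1 ip4:192.168.20.0/24 include:spfa.protection.example.net -all' ;;
        spfa.protection.example.net)
            echo 'v=spf1 include:spf.protection.example.net -all' ;;
    esac
}

# spf_walk DOMAIN [SEEN]
# Depth-first walk of include: mechanisms. Prints every ip4: range
# found; prints "loop at NAME" and returns 1 if NAME is already on
# the current include path. Each include is walked in a subshell so
# the recursive call cannot clobber this call's variables.
spf_walk() {
    domain=$1
    seen=${2:-}
    case " $seen " in
        *" $domain "*) echo "loop at $domain"; return 1 ;;
    esac
    seen="$seen $domain"
    for term in $(fake_txt "$domain"); do
        case $term in
            ip4:*)     echo "${term#ip4:}" ;;
            include:*) (spf_walk "${term#include:}" "$seen") || return 1 ;;
        esac
    done
}

# Prints 192.168.20.0/24, then "loop at spf.protection.example.net",
# then the permerror line.
spf_walk example.com || echo "SPF permerror: include loop"
```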

The ASSP logfile teaches us this:

Mar-22-15 18:40:32 64032-18820 10.0.0.54 to: wynkoop@wynn.com [spam found][blocked] -- SPF fail -- [[hot linux admin contract]] -> /usr/local/assp/spam/64032-18820.eml;

Examining the line above, we see that the connection came in from 10.0.0.54, not from any system in the 192.168.20.0/24 network that the SPF record authorizes. Further, the SPF record tells SPF-checking MTAs that if they get email claiming to be from example.com that is not from 192.168.20.0/24, they should hard-fail the message.

So my mail server was doing exactly what the folks at example.com had instructed it to do by way of their SPF entry (the -all means reject if not from a host specified in this record).
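The ip4: test itself is just a comparison of network bits. Here is a minimal POSIX sh sketch, assuming a shell with 64-bit arithmetic (as dash and bash have) and dotted-quad input without leading zeros; `ip_to_int` and `ip4_in_cidr` are hypothetical helper names, and 192.168.20.7 is an illustrative address. It shows why 10.0.0.54 does not match 192.168.20.0/24, so evaluation falls through to -all and the result is a hard fail.

```sh
#!/bin/sh
# ip_to_int A.B.C.D: hypothetical helper, dotted quad -> integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip4_in_cidr IP NET/PREFIX: succeed if IP lies inside NET/PREFIX,
# the test an SPF checker applies to an ip4: mechanism.
ip4_in_cidr() {
    prefix=${2#*/}
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    # Shift away the host bits on both sides and compare what is left.
    [ $(( ip >> (32 - prefix) )) -eq $(( net >> (32 - prefix) )) ]
}

for sender in 10.0.0.54 192.168.20.7; do
    if ip4_in_cidr "$sender" 192.168.20.0/24; then
        echo "$sender: pass (matches ip4:192.168.20.0/24)"
    else
        echo "$sender: fail (falls through to -all)"
    fi
done
```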

Do I find it odd that the gentleman on the phone told me that I was the only one he ever had this issue with? No, not really. Most mail servers on the internet today are improperly configured in one way or another, and another common misconfiguration would mask this problem. Many mail servers either silently drop spam, or silently put it into a spambox for the user. Either way the email is not delivered to the user and the sender gets no feedback to let them know there is a problem. The sender just thinks he was ignored. When I told the recruiter about this second common misconfiguration he exclaimed "That must be why so many people seemed to be ignoring my email this past week".

In more than 30 years of running an internet-connected email server, I have never found a delivery problem to be caused by my end of the path, but the number of delivery issues has been on the rise. I feel this is largely because people do not read the RFCs and figure, well, if it seems to work it must be OK. In many cases smaller companies, like the one in question here, do not have staff on hand who understand the technology, and they outsource their email hosting to ---SOME---BIG---COMPANY---IN--THE-CLOUD--. There is a very good chance that ---SOME---BIG---COMPANY---IN--THE-CLOUD-- does not have fully competent staff. There has been a real trend in IT to eliminate higher-paid senior engineering staff because "most of the time it just works". As a result, subtle problems can arise that inexperienced or improperly trained technical staff will never notice.

If you use email or DNS and do not have your own expert on staff to make sure your servers are properly configured, I suggest you hire a consultant for a couple of days to get a health check and take corrective action if needed.

Remember: if you want your outbound email to be delivered, no SPF record is better than a BROKEN SPF record.

-Brett
wynkoop--at--wynn.com
917-642-6925


My first FreeBSD Port - sws [shell web server]    
Well, at long last I did it. I contributed a piece of my code, sws, to the FreeBSD ports system. sws is a small web server for static content, written in /bin/sh, which should run on any POSIX system.

I first released it on Freshmeat.net back in 2005 or 2006, with a pointer to its self-served home site at http://prd4.wynn.com:8080/. A couple of years ago someone suggested I roll it into NetBSD pkgsrc, but as I have not had a NetBSD box under my control in a couple of years, I decided, what the heck, I have a bunch of FreeBSD boxes to test a new port on, so FreeBSD it was.

A quick look at the FreeBSD Handbook and a couple of trial runs and I was ready to submit the port. It was accepted and you can now find it in /usr/ports/www/sws on any FreeBSD system with a current version of ports.

If, like me, you are running FreeBSD on ARM, you can visit http://pkg.wynn.com/, a small ARM box serving ARM packages via sws.

I know there are about 100 installs of sws at a government agency, because I installed them, but I know of no others in the wild, even though it had some followers on Freshmeat before it shut down. If you are using sws somewhere, I would love to hear about it!




Back to BSD    
I have a client whom I first met 20 years ago. At that time I was part of the technical staff at BSDI. My duties at BSDI were a mix of email and phone support for our customers, development work, and QC of our OS releases. As the internet boom started to ramp up, we began getting customers asking for professional services. Since I had a consulting background, I became the guy who ran consulting services for BSDI. This is how I met Vince.

At the time, BSDI had just folded services for NetWare into BSD/OS and was selling a NetWare-to-internet gateway product. Given my consulting background and the time I had spent on the NetWare project, when we got a call asking whether we would send someone to Long Island to install our internet server and NetWare gateway product, the answer was of course we would. Heck, I was only 60 miles away.

Vince eventually decided to migrate his company's network from NetWare to TCP/IP. By then BSDI had failed, but he was so impressed with BSD that he asked me to update his gateway/mailserver/DNS server/webserver box to the then-current FreeBSD and to help migrate the company over to TCP/IP. I was happy to do this.

At some point Vince started his own little company and again asked me to set up a FreeBSD box to take care of his firewall, mail, DNS and web server needs. In short order I had him up and running on a vanilla x86 box he handed me, with all his essential services operational. This also gave me an off-site DNS slave for my zones.

To my surprise, one day zone transfers for his zones to my DNS server stopped, and his DNS server was not responding to queries or ssh attempts. I rang Vince up and said he needed to take a look at the machine, see what the console said, and get his box back up. It was then that he told me he had replaced it with Windows Server 2008 and MS Exchange. His stated reason: so he could manage it himself. That sounded odd, as his only management had ever consisted of creating new user accounts or deleting old ones, all of which was handled via the Webmin GUI. But I wished him well with his new setup and changed my own DNS zones to no longer reference his now less functional name server (he could not figure out how to make it act as a slave for my zones).

Much to my surprise, last week Vince called and asked if FreeBSD would run on a PowerPC Macintosh. When I asked why, he quickly told me he wanted to go back to his setup of a few years ago, with a BSD box handling his email, web, and DNS service. It seems the Microsoft option did not work out for him, so he moved to hosted services, only to realize that he already had an internet connection at his office and could just go back to what had worked for many years.

Vince was surprised when I said no need to put the Mac G5 on FreeBSD. He had no idea that his Mac was already running what amounts to FreeBSD user-land on top of a Mach kernel. Needless to say I set about sshing into his G5 and building/installing what was needed to get him operational.

The first step was finding the developer tools for Mac OS X 10.5. Luckily they were still on the Apple Developer site, so a short time later I had them installed and had started to install MacPorts to handle most of my third-party software needs. MacPorts comes to us from Jordan K. Hubbard (a big contributor to FreeBSD) and a small team of like-minded folks who saw the need for OS X to have something like ports or pkgsrc. After using it for years, my thought is that MacPorts is even better than either ports or pkgsrc.

So it looks like some small companies can learn from their mistakes and return to using good, solid Free Software. What advocates of Free Software have to do is keep lines of communication open with their clients or company management, so that when the opportunity presents itself the moment can be seized, and good, solid, standards-based Free Software solutions can replace expensive and often insecure commercial offerings.



Lessons From a Gmirror Failure    
Modern versions of FreeBSD have some great tools for RAID and disk management. One of these is gmirror, which I have been using since about FreeBSD 6.2. I switched from dedicated hardware RAID controllers to software mirroring at that time after a bad experience: a controller died, I could not get an exact replacement, and my client's RAID array became so much useless, unreadable data.

The beauty of gmirror is that it is controller independent, and if you lose a controller or disk the system keeps running. If your hardware failure means you have to move the disks, or maybe just one of them, to another system, it should just work. It usually does, since there is no dependence on a specific controller with its own secret low-level formatting magic that prevents use of a different controller. Often even controllers from the same vendor do not work with disks prepared on a different model. I have been bitten more than once by Dell boxes that should have been the same!

I can demonstrate the value of gmirror with an incident at a client a few years ago. The IT director called me about his Chicago mail server. It seems that his Chicago tech had just called to tell him that smoke was coming from one of the disk drives on the mail server. I had set the box up a few years prior using FreeBSD 6.x with a pair of gmirrored drives that were hot swap. I told him to pull the smoking drive and replace it with the same or larger size disk and call me back. When they called back I logged into the box and ran the following commands:


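# gmirror forget gm0           # drop the dead, disconnected disk from gm0
# gmirror insert gm0 ad2       # add the replacement disk as a new component
# gmirror rebuild gm0 ad2      # force a rebuild of ad2 from the good disk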

In short order they were back to running on both disks with zero downtime.

The above is typical of my experience with gmirror on many systems at many locations.

Overall I have been a happy gmirror user for years, but just like RAID 5, gmirror has its own "hole". I discovered this hole on my own main system.

While reading the nightly email from my various systems, I noticed that my FreeBSD 7.x machine had lost one of its disks in a gmirror pair. No problem, I thought. I shut down the box (no hot swap) and pulled the bad disk. I had an identical replacement disk on the shelf, so I inserted it and, after rebooting, ran the same series of commands as above. Just as the resilver operation got into the 98% range, I started seeing hard read errors on the console. Eventually, after all retries were exhausted, the gmirror rebuild failed. I tried several more times to rebuild the mirror while consulting online resources. It seems that since the remirror process is a bit-for-bit disk copy, just as if we were doing


dd if=/dev/ad0 of=/dev/ad1

it will always fail if there are read errors on the source disk. My conclusion was that a part of the disk I was not using yet had lost some of its rust and could no longer be read for a rebuild. I was caught in the gmirror hole. My data was all there and intact, but the only way to recover would be to dump the entire disk with something like tar in single-user mode, boot from a live CD, format a new disk, and restore from the backup. I was looking at extended downtime. I could have bitten the bullet and accepted it, but I wanted a better way. I consulted others long in the tooth with FreeBSD and gmirror, and everyone who had run into this situation said the same thing: dump, new disk, restore, second new disk, set up gmirror again. About this time I was kicking myself for not running a 3-disk mirror instead of keeping the spare on the shelf.

My failed box held my home directory and a FreeBSD jail that acted as my primary DNS server as well as my mail and web server. I did not want the DNS/mail/web service to be offline for too long, so I decided to spin off a tar of the entire disk just in case, and set up a regular rsync to another box so I could restore easily if the last disk died. In the meantime I looked for a solution that would limit my downtime to a few moments, not hours.

The solution came in the form of a newly configured FreeBSD 9.1 box running a ZFS RAID-Z pool across 3 disks. I had been charged with recycling a Dell 1750 with three 75G SCSI disks that had been pulled from a client. The client wanted to make sure I destroyed their data before the computer went to anyone else. I grabbed the very fine FreeBSD install image called mfsBSD from http://mfsbsd.vx.sk. Its author, Martin Matuska, is a FreeBSD developer, and his ZFS install images do a better job of installing FreeBSD than the current official install images.


I will leave much of the detail of the operation for another article, but the basics were:

o Install FreeBSD 9.1 with ZFS RAID-Z

o Install my must-haves (bash, screen, nload, openntpd, ezjail)

o Build a custom kernel (yeah, I still do this)

o Configure base system to be jail friendly (e.g., bind ssh to a single IP)

o Set up ezjail following the docs

o Set up a test jail

o Hammer on the system with disk and network I/O for days to prove the system is stable

o Configure a jail to hold the DNS/web/mail server

o Rsync the jail from the old 7.x box to the 9.1 box.

o Test the new jail on a different IP address

o Shut down the old jail and the new jail

o Rsync the old jail to the new jail again

o Switch the IP address of the new jail

o Start new jail

o Done, with a total downtime of less than 5 minutes!
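The cutover in the steps above boils down to two rsync passes: a long one while the old jail is still serving, then a short one after both jails are stopped, so only the changes made since the first pass are copied during the outage. Here is a minimal sketch, assuming ezjail on both hosts, a jail named `mail` under /usr/jails, an old host reachable over ssh as `oldbox.example.com`, and root access on both ends; all of these names are hypothetical. It only prints the commands unless DRY_RUN=0, and it leaves the IP switch as a manual step.

```sh
#!/bin/sh
# Hypothetical names: adjust for your own hosts and jail.
OLD_HOST=${OLD_HOST:-oldbox.example.com}
JAIL=${JAIL:-mail}
JAIL_ROOT=${JAIL_ROOT:-/usr/jails/$JAIL}
DRY_RUN=${DRY_RUN:-1}          # 1 = only print the commands

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# -a keeps permissions, owners, times and symlinks; -H keeps hard links;
# --numeric-ids keeps uid/gid numbers as-is. /basejail is skipped on the
# assumption that it is ezjail's shared, nullfs-mounted base system.
sync_jail() {
    run rsync -aH --delete --numeric-ids --exclude=/basejail \
        "$OLD_HOST:$JAIL_ROOT/" "$JAIL_ROOT/"
}

# Run sync_jail once beforehand (old jail still live) and test the new
# jail on a spare IP. The cutover below is the short outage window.
cutover() {
    run ssh "$OLD_HOST" ezjail-admin stop "$JAIL"
    run ezjail-admin stop "$JAIL"
    sync_jail                      # second pass: only what changed
    # Switch the new jail to the production IP here (manual step).
    run ezjail-admin start "$JAIL"
}

case ${1:-} in
    sync)    sync_jail ;;
    cutover) cutover ;;
esac
```

Usage would be `sh migrate.sh sync` well ahead of time, then `DRY_RUN=0 sh migrate.sh cutover` once the new jail has been tested.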

So using jails saved me from an extended outage. Now that my jails are on a system running ZFS, I can take ZFS snapshots, send them to another system for safekeeping, and quickly bring them up there in case of hardware failure.
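A minimal sketch of that snapshot-and-ship routine, assuming the jails live under a dataset named `zroot/jails`, a backup host `backup.example.com` reachable as root over ssh, and an existing parent dataset `tank/backup` on that host; all three names are hypothetical. The first run sends a full replication stream into a new dataset; later runs could send only the changes with `zfs send -i` (or `-I`) against the previous snapshot. The script only prints the commands unless DRY_RUN=0.

```sh
#!/bin/sh
# Hypothetical dataset and host names.
SRC=${SRC:-zroot/jails}
DEST_HOST=${DEST_HOST:-backup.example.com}
DEST=${DEST:-tank/backup/jails}
DRY_RUN=${DRY_RUN:-1}          # 1 = only print the commands

# snapshot_and_send [STAMP]: recursive snapshot of $SRC, then a full
# replication stream (-R) piped over ssh; "zfs receive -u" leaves the
# copy unmounted on the backup host. $DEST must not exist yet.
snapshot_and_send() {
    snap="$SRC@${1:-$(date +%Y%m%d-%H%M)}"
    if [ "$DRY_RUN" = 1 ]; then
        echo "zfs snapshot -r $snap"
        echo "zfs send -R $snap | ssh $DEST_HOST zfs receive -u $DEST"
    else
        zfs snapshot -r "$snap" &&
            zfs send -R "$snap" | ssh "$DEST_HOST" zfs receive -u "$DEST"
    fi
}

case ${1:-} in
    send) snapshot_and_send ;;
esac
```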

Will I use gmirror in the future? Probably, but on systems that are too small to support ZFS. If I have the choice to run 3 disks on a smaller system, I will probably go for graid3 instead of gmirror. That one extra disk will of course add space to the total array, but my biggest hope is that with more disks I can avoid the silent two-disk failure I ran into with gmirror.

For now ZFS will be my default if the box has at least 1 GB of RAM. The 1750 has been up for 17 days acting as my mail/DNS/web server and also as a build machine for various things I am testing in other jails. Contrary to what some folks have stated, it appears ZFS can run nicely on boxes with less than 4 GB of RAM.



